Testing for GxE interaction
in structured populations

Andrey Ziyatdinov

May 5, 2017

GxE interaction

Biological interaction: Genetic factor(s) and environmental factor(s) participate in the same causal mechanism in the same individual (Rothman et al., 2008)


Statistical interaction using linear regression (unrelated individuals):

\(y = \mu + \beta_g x_g + \beta_e x_e + \beta_{int} x_g \times x_e + e\)
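A minimal sketch of fitting this model by ordinary least squares is given here for illustration. The simulated genotype, exposure, sample size and effect sizes are all assumptions chosen for the example; they are not values from any study cited in these slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical inputs: genotype dosage (0/1/2) and a binary exposure
x_g = rng.binomial(2, 0.3, size=n).astype(float)
x_e = rng.binomial(1, 0.5, size=n).astype(float)

# Illustrative effect sizes (assumptions, not estimates from real data)
mu, beta_g, beta_e, beta_int = 1.0, 0.2, 0.5, 0.3
y = mu + beta_g * x_g + beta_e * x_e + beta_int * x_g * x_e + rng.normal(0.0, 1.0, size=n)

# Design matrix: intercept, G, E and the product term G x E
X = np.column_stack([np.ones(n), x_g, x_e, x_g * x_e])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Model-based standard errors, valid for unrelated individuals only
resid = y - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# 1-df Wald statistic for the interaction term
z_int = coef[3] / se[3]
```

The last coefficient is the estimate of \(\beta_{int}\); its standard error assumes independent residuals, which is exactly what fails in structured populations.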

GxE interaction in GWAS context

| Consortium | Sample size | Exposure | Outcome | Reference |
| --- | --- | --- | --- | --- |
| CHARGE + SpiroMeta | 50,047 | Smoking | Pulmonary function | (Hancock et al., 2012) |
| SUNLIGHT | 35,000 | Vitamin D intake | Circulating vitamin D level | (Wang et al., 2010) |
| GIANT | up to 339,224 | Gender | Anthropometric traits | (Heid et al., 2010) |
…

Most studies used the full model from the previous slide; others used a stratified approach (see the next slides).

GxE interaction in GWAS context

Example: G x smoking in pulmonary function outcomes (Hancock et al., 2012)

  • 50,047 participants from 19 studies; ~2.5M SNPs
  • outcomes: FEV1, FEV1/FVC (%)
  • saturated model for smoking: ever-smoker, current-smoker, pack-years
    • all three smoking variables tested for GxE separately
  • joint test: \(\beta_g = 0\) and \(\beta_{int} = 0\) under the null (Aschard et al., 2011)
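As an illustration of the joint test, a generic 2-df Wald statistic can be computed from the two estimates and their covariance matrix. This is a sketch of the general form only and is not claimed to be the exact procedure of the cited studies; the function name and the input values in the usage lines are hypothetical.

```python
import math

def joint_wald_2df(beta_g, beta_int, var_g, var_int, cov_g_int):
    """2-df Wald test of H0: beta_g = 0 and beta_int = 0.

    Inputs are the two estimates and the entries of their 2x2
    covariance matrix S = [[var_g, cov], [cov, var_int]].
    """
    det = var_g * var_int - cov_g_int ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b' S^{-1} b, written out for the 2x2 case
    stat = (var_int * beta_g ** 2
            - 2.0 * cov_g_int * beta_g * beta_int
            + var_g * beta_int ** 2) / det
    # The chi-square survival function with 2 df is exp(-x / 2)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Hypothetical estimates with uncorrelated errors
stat, p_value = joint_wald_2df(0.2, 0.3, 0.01, 0.01, 0.0)
```

When the two estimates are uncorrelated, the statistic reduces to the sum of the two squared z-scores.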


Findings: three novel gene regions

  1. DNER
  2. HLA-DQB1 and HLA-DQA2
  3. KCNJ2 and SOX9

GxE interaction in GWAS context

Stratified GxE tests (Magi et al., 2010; Randall et al., 2013) are widely used in meta-analyses by big consortia

Example: G x gender in the Genetic Investigation of Anthropometric Traits (GIANT) consortium

  • 14 WHR-associated SNPs
  • 108,979 women
    • 42,735 (discovery) and 66,244 (follow up)
  • 82,483 men
    • 34,601 (discovery) and 47,882 (follow up)


Findings: 7 loci showed sex-specificity

  • explained variance in WHR by 14 loci
    • 1.34% in women
    • 0.46% in men

There might be others but unpublished…

because of the challenges inherent in detecting GxE

Interesting to know, difficult to detect

Power of the interaction test is much lower than that of the marginal test

  • A plot + an example – N =10,000 – detectable effect
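A rough numerical illustration of this power gap, under simplifying assumptions that are not from the slides: unit residual variance, a hypothetical allele frequency and effect size, a genome-wide significance level of 5e-8, and a binary exposure that splits the sample into two equal strata, so that the interaction is the difference between two stratum-specific effects.

```python
from statistics import NormalDist

def power_1df(ncp, alpha=5e-8):
    """Approximate power of a two-sided 1-df z-test with non-centrality ncp."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    return (1.0 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

n = 10_000                    # sample size used on the slide
maf = 0.3                     # hypothetical minor allele frequency
var_g = 2 * maf * (1 - maf)   # genotype variance under Hardy-Weinberg
beta = 0.1                    # hypothetical effect, in residual-SD units

# Marginal test on all N samples: var(b) = 1 / (N * var_g)
ncp_marginal = n * var_g * beta ** 2

# Interaction as a difference between two strata of N/2 samples each:
# var(b1 - b2) = 2 / ((N/2) * var_g), so the NCP is four times smaller
ncp_interaction = n * var_g * beta ** 2 / 4.0

power_marginal = power_1df(ncp_marginal)
power_interaction = power_1df(ncp_interaction)
```

Under these assumptions, an interaction effect of the same magnitude as a marginal effect needs about four times the sample size to reach the same non-centrality.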


It also faces other potential issues (Aschard et al., 2012, HumGen):

  • Confounding
  • Exposure measurement error and misclassification
  • Dynamics of gene–environment interactions
  • …

Relatedness is yet another layer of complexity in GxE analysis

Relatedness considered in our analysis

  • shared environment: household groups
  • recent genetic relatedness: family members
  • distant genetic relatedness: admixed populations

Our goal

Assess the relative performance of GxE methods
in the presence of structure

  • derive (or re-derive) formulas for \(\hat{\beta}_{int}\) and \(var(\hat{\beta}_{int})\)
  • assess statistical power
  • guide the methodological choice: standard vs. stratified

Outline

  1. Linear mixed model in related individuals
  2. Linear mixed model and GxE
  3. GxE in admixed population
  4. lme4qtl R package

Relatedness in association tests

Methods to account for relatedness are relatively well established
in marginal association studies (GWAS)

  • Principal component analysis (PCA)
  • Linear mixed models (LMMs) (Yang et al., 2014)
  • Robust tests: genotype-conditional association test (GCAT) (Song et al., 2015, NatGen)

Study design considered

| | Study design 1 | Study design 2 | Study design 3 |
| --- | --- | --- | --- |
| Sample | Family-based | Population-based | Population-based |
| Relationships | Kinship | | GRM |
| Method | Linear mixed models | Linear models | Linear mixed models |


GxE in study design 3 is our ongoing work (not presented today)


GxE in study designs 1 vs. 2 (today's focus)

  • Compare two study designs unrelated/related
  • LMM performed using our new lme4qtl R package (under submission)

Simulation study

Given: a population of 50,000 related samples (nuclear families)

Experiment: draw 5,000 unrelated samples, or draw 5,000 samples at random (which keeps related individuals)

| Relatedness | \(V\) | \(\Sigma_x\) | Normalization |
| --- | --- | --- | --- |
| unrelated | \(\sigma_g^2 K + \sigma_r^2 I = (\sigma_g^2 + \sigma_r^2) I\) | \(\sigma_x I\) | \(\sigma_g^2 + \sigma_r^2 = 1\) |
| genetically related | \(\sigma_g^2 K + \sigma_r^2 I\) | \(\sigma_x K\) | \(\sigma_g^2 + \sigma_r^2 = 1\) |


  • \(K\), the double kinship matrix
  • \(\Sigma_x\), the variance-covariance matrix of predictor \(x\)
  • \(\sigma_g^2\), \(\sigma_r^2\), variance proportions

Linear mixed model (also our data simulation model)

\(y = \mu + \beta_g x_g + g + f + e\)
\(\mbox{ } \mbox{ } = X \beta + g + f + e\)

\(\mbox{where } g \perp f \perp e\)

  • \(g \sim (0, \sigma_g^2 K)\), the (additive) genetic effect
  • \(f \sim (0, \sigma_f^2 F)\), the household/family effect
  • \(e \sim (0, \sigma_r^2 I)\), the residual error

\(\mbox{implying}\)

\(y \sim (X \beta, \sigma_g^2 K + \sigma_f^2 F + \sigma_r^2 I) = (X \beta, V)\)
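A minimal sketch of simulating from this model for nuclear families. The family structure, variance components, allele frequency and effect size are assumptions made for the example. Genotypes are drawn independently here for brevity, which ignores their correlation within families.

```python
import numpy as np

def nuclear_family_K(n_offspring=2):
    """Double kinship matrix for two unrelated parents and their offspring."""
    m = 2 + n_offspring
    K = np.full((m, m), 0.5)     # parent-offspring and sib-sib pairs: 1/2
    K[0, 1] = K[1, 0] = 0.0      # parents are assumed unrelated
    np.fill_diagonal(K, 1.0)
    return K

n_fam, fam_size = 100, 4
n = n_fam * fam_size

# Block-diagonal matrices: one block per family
K = np.kron(np.eye(n_fam), nuclear_family_K(fam_size - 2))
F = np.kron(np.eye(n_fam), np.ones((fam_size, fam_size)))  # household = family
I = np.eye(n)

# Illustrative variance components summing to 1
s2_g, s2_f, s2_r = 0.4, 0.2, 0.4
V = s2_g * K + s2_f * F + s2_r * I

rng = np.random.default_rng(1)
x_g = rng.binomial(2, 0.3, size=n).astype(float)  # simplification: independent draws
mu, beta_g = 0.0, 0.1

# g + f + e has covariance V, so draw it in one step via the Cholesky factor
L = np.linalg.cholesky(V)
y = mu + beta_g * x_g + L @ rng.standard_normal(n)
```

Drawing the sum \(g + f + e\) directly from \(V\) is equivalent to drawing the three independent effects separately.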

Marginal effect estimate

  1. Estimate variance components by ML/REML

\(\hat{V} = \hat{\sigma}_g^2 K + \hat{\sigma}_f^2 F + \hat{\sigma}_r^2 I\)

  2. Derive the effect sizes as in Generalized Least Squares (GLS)

\(\hat{\beta} = (X^T \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1} y\)

\(\hat{\beta} \simeq \mathcal{N}(\beta, (X^T \hat{V}^{-1} X)^{-1})\)

  3. Simplify to a one-covariate model by orthogonalization
    • the original model: \(E(y) = \mu + \beta_g x_g\)
    • \(y^*\), centered \(y\)
    • \(x^*_g\), centered \(x_g\)

\(var(\hat{\beta}_g) = ({x^*_g}^T \hat{V}^{-1} x^*_g)^{-1}\)
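The GLS step can be sketched as follows, assuming the variance components have already been estimated so that \(\hat{V}\) is available. The function name and the toy sib-pair data are hypothetical; this is not the lme4qtl implementation.

```python
import numpy as np

def gls_fit(X, y, V):
    """Generalized least squares.

    Returns beta_hat = (X' V^-1 X)^-1 X' V^-1 y and its
    covariance matrix (X' V^-1 X)^-1.
    """
    Vinv_X = np.linalg.solve(V, X)   # V^-1 X without forming V^-1 explicitly
    Vinv_y = np.linalg.solve(V, y)   # V^-1 y
    cov = np.linalg.inv(X.T @ Vinv_X)
    beta_hat = cov @ (X.T @ Vinv_y)
    return beta_hat, cov

# Toy check on three sib pairs, with V treated as known
K = np.kron(np.eye(3), np.array([[1.0, 0.5], [0.5, 1.0]]))
V = 0.5 * K + 0.5 * np.eye(6)
x_g = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 2.0])
X = np.column_stack([np.ones(6), x_g])
y = 1.0 + 0.3 * x_g                  # noise-free, so GLS recovers (1.0, 0.3)

beta_hat, cov = gls_fit(X, y, V)
se_g = np.sqrt(cov[1, 1])            # standard error of the genotype effect
```

Solving linear systems with \(V\) instead of inverting it is a common numerical choice; for large samples, dedicated LMM software exploits the structure of \(K\).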

Power

The power in genetic association studies with linear models is a function of the non-centrality parameter (NCP)

\(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_x)\)

| Data | Model |
| --- | --- |
| phenotype | \(y \sim (X \beta, V) = (X \beta, \sigma_f^2 F + \sigma_g^2 K + \sigma_r^2 I)\) |
| genotypes | \(x \sim (\mu_x, \Sigma_x)\) |
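The NCP approximation can be evaluated numerically for the two simulation scenarios (unrelated vs. genetically related). The sketch below uses sib pairs, no household effect, and hypothetical values for the effect size, variance components and genotype variance.

```python
import numpy as np

def ncp(beta, V, Sigma_x):
    """NCP approximation: beta^2 * tr(V^-1 Sigma_x)."""
    return beta ** 2 * np.trace(np.linalg.solve(V, Sigma_x))

n_pairs = 500
# Double kinship matrix for sib pairs (1/2 between sibs)
K = np.kron(np.eye(n_pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))
I = np.eye(2 * n_pairs)

s2_g, s2_r = 0.5, 0.5   # illustrative variance proportions, summing to 1
sigma_x = 0.42          # hypothetical genotype variance scale
beta = 0.1              # hypothetical effect size

# Unrelated: K = I, so V = (s2_g + s2_r) I and Sigma_x = sigma_x I
ncp_unrelated = ncp(beta, (s2_g + s2_r) * I, sigma_x * I)

# Genetically related: V = s2_g K + s2_r I and Sigma_x = sigma_x K
ncp_related = ncp(beta, s2_g * K + s2_r * I, sigma_x * K)
```

In this toy setting the related design gives a somewhat smaller NCP than the unrelated design of the same size.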

Type I error rate

The standard error from a fixed-effect linear model (LM) applied to related individuals is not well calibrated: it is underestimated, which inflates the type I error rate
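A sketch of why, for the one-covariate model with centered \(x^*_g\), conditioning on the genotypes, ignoring the household effect and treating the variance components as known. The superscript \(LM\) is notation introduced here for the ordinary least squares estimate.

The true variance of the LM estimate is

\(var(\hat{\beta}_g^{LM}) = \frac{{x^*_g}^T V x^*_g}{({x^*_g}^T x^*_g)^2}\)

while the LM reports approximately

\(\widehat{var}(\hat{\beta}_g^{LM}) \approx \frac{\sigma_g^2 + \sigma_r^2}{{x^*_g}^T x^*_g}\)

With \(V = \sigma_g^2 K + \sigma_r^2 I\), the ratio of the two is

\(\frac{var(\hat{\beta}_g^{LM})}{\widehat{var}(\hat{\beta}_g^{LM})} \approx \frac{\sigma_g^2 \, ({x^*_g}^T K x^*_g) / ({x^*_g}^T x^*_g) + \sigma_r^2}{\sigma_g^2 + \sigma_r^2}\)

which exceeds 1 whenever \({x^*_g}^T K x^*_g > {x^*_g}^T x^*_g\), as is expected when genotypes are positively correlated within families.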

Compare formula

Confirmed the known results

Our formula also lets us compare performance across various study designs

Conclusion

  1. GxE analysis in genetically related individuals is slightly less powerful than in unrelated individuals
  2. GxE analysis that takes into account shared environment relatedness always increases the power
    • more variance is explained

Part 2: Linear mixed model and GxE

Relatedness in interaction tests (stratified)

  1. Compute marginal genetic effects in strata, e.g., males and females
    • \(\beta_m\) and \(\beta_f\), the genetic effects
    • \(\sigma_{\beta_m}\) and \(\sigma_{\beta_f}\), their standard errors
  2. Combine stratified results and perform tests
    • strata-specific, interaction (differentiated), joint, heterogeneity

| Strata | Stratified interaction test | Reference |
| --- | --- | --- |
| Independent | \(Z_{int} = \frac{\beta_m - \beta_f}{\sqrt{\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2}} \sim \mathcal{N}(0, 1)\) | (Magi et al., 2010) |
| Related | \(Z_{int} = \frac{\beta_m - \beta_f}{\sqrt{\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2 - 2 r \sigma_{\beta_m} \sigma_{\beta_f}}} \sim \mathcal{N}(0, 1)\) | (Randall et al., 2013) |

\(r\) is the Spearman rank correlation between the stratum-specific estimates across SNPs

  • a naive approach that needs further investigation (Sofer et al., 2016, GenEpi)
  • methodology is similar to multi-trait meta-analysis (Zhu et al., 2016, AJHG)
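A minimal sketch of the stratified interaction test computed from summary statistics. With \(r = 0\) it is the independent-strata test; a non-zero \(r\) subtracts the covariance term, as in the variance of a difference of two correlated estimates. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def stratified_interaction_z(beta_m, se_m, beta_f, se_f, r=0.0):
    """Z-test for a difference in genetic effects between two strata.

    r is the correlation between the two stratum-specific estimates;
    r = 0 corresponds to independent strata.
    """
    var_diff = se_m ** 2 + se_f ** 2 - 2.0 * r * se_m * se_f
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    z = (beta_m - beta_f) / math.sqrt(var_diff)
    # Two-sided p-value under the standard normal null
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical stratum-specific estimates and standard errors
z_indep, p_indep = stratified_interaction_z(0.10, 0.02, 0.04, 0.02)
z_rel, p_rel = stratified_interaction_z(0.10, 0.02, 0.04, 0.02, r=0.5)
```

A positive correlation between strata shrinks the variance of the difference, so ignoring it makes the test conservative.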

Our methods

  • Simulate related individuals with GxE interactions
    • shared environments, family-based, admixed (to be done)
  • Fit data to estimate the interaction effect by linear mixed models
    • lme4qtl R package
  • Compare interaction tests when relatedness is present
    • Compare two study designs unrelated/related

Formulas for marginal effects

Formulas for interaction effects

Thank you

Extra slides

GAIT2 Spanish families (previous project)

The Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) Project

  • Study of Venous Thrombosis
    • disease prevalence <1%
    • heritability \(\sim\) 60%
  • 935 individuals in 35 families (27 per family on average)
  • Hundreds of phenotypes (blood coagulation system)
  • Genotype and RNA-seq data

Developed tools for analysis of family-based samples

  • solarius R package [makes SOLAR easier to use]
  • lme4qtl R package [makes lme4 flexible]

COPDgene African-Americans (current project)

COPDgene dataset

  • 3,300 admixed African-Americans
  • a set of COPD outcomes/exposures
  • SNPs, inferred local/global ancestry

Previous studies reported

  • Smoking is the major risk factor
  • African ancestry is associated with increased risk of COPD

The project aims at leveraging the ancestry information in GxE tests

  • global ancestry \(\times\) exposure
  • local ancestry \(\times\) exposure
  • SNP \(\times\) exposure